Abstract — The goal of object tracking is to segment a region
of interest from a video scene and to keep track of its motion,
position and occlusion. Object detection and object
classification are the steps that precede tracking an object
through a sequence of images. The mean shift algorithm has
recently been widely used in tracking, clustering, etc. The first
phase of the system detects the moving objects in the video; the
second phase tracks the detected object. In this paper, moving
objects are detected using simple background subtraction, and a
single moving object is tracked using a modified mean shift
method and a Kalman filter. The results of the two algorithms
are then compared on the basis of time and accuracy.

Index Terms — object tracking; Kalman filter; mean shift
method.


I. INTRODUCTION

Object tracking is an important task within the field of
computer vision. The proliferation of high-powered
computers, the availability of high-quality and inexpensive
video cameras, and the increasing need for automated video
analysis have generated a great deal of interest in object
tracking algorithms. There are three key steps in video
analysis: detection of interesting moving objects, tracking of
such objects from frame to frame, and analysis of object
tracks to recognize their behavior. Therefore, the use of
object tracking is pertinent in the tasks of motion-based
recognition, that is, human identification based on gait,
automatic object detection, etc., and in traffic monitoring, that
is, real-time gathering of traffic statistics to direct traffic flow.
In its simplest form, tracking can be defined as the problem of
estimating the trajectory of an object in the image plane as it
moves around a scene. In other words, a tracker assigns
consistent labels to the tracked objects in different frames of a
video. Additionally, depending on the tracking domain, a
tracker can also provide object-centric information, such as
orientation, area, or shape of an object. Tracking objects can
be complex due to loss of information caused by projection of
the 3D world on a 2D image and noise in images.

Mean shift, which was proposed in 1975 by Fukunaga and
Hostetler [1], is a nonparametric, iterative procedure that
shifts each data point to a local maximum of the density
function. In spite of its good properties, it was largely ignored
until Cheng's paper [2] renewed interest in it. Cheng [2]
revisited mean shift, developing a more general formulation
and demonstrating its potential uses in clustering and global
optimization. Since then, mean shift has been widely used in
object tracking [3-7], image segmentation [8, 9], pattern
recognition and clustering [10, 11], filtering [12], information
fusion [13], etc.

The Kalman filter is an optimal recursive data processing
algorithm. It consists of two phases: (i) prediction and (ii)
correction. The first phase predicts the next state from the
current set of observations and updates the current set of
predicted measurements. The second phase corrects the
predicted values and gives a much better approximation of the
next state. The filter attempts to achieve a balance between
predicted values and noisy measurements. The values of the
weights are determined by modeling the state equations.


II. LITERATURE SURVEY

Cheng [19] introduced the mean shift algorithm to the field of
computer vision. In his paper, mean shift, a simple iterative
procedure that shifts each data point to the average of the data
points in its neighborhood, is generalized and analyzed. This
generalization makes some k-means-like clustering algorithms
its special cases. It is shown that mean shift is a mode-seeking
process on a surface constructed with a "shadow" kernel. For
Gaussian kernels, mean shift is a gradient mapping.
Convergence is studied for mean shift iterations. Cluster
analysis is treated as a deterministic problem of finding a
fixed point of mean shift that characterizes the data.
Applications in clustering and the Hough transform were
demonstrated. Mean shift is also considered as an
evolutionary strategy that performs multistart global
optimization.
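
The mode-seeking behaviour described above can be illustrated
with a minimal Python sketch. It is only an illustration under
stated assumptions, not code from [19]: the data are
one-dimensional, the kernel is Gaussian, and the function name,
the bandwidth and the sample points are hypothetical.

```python
import math

def mean_shift_mode(points, start, bandwidth=1.0, eps=1e-6, max_iter=200):
    """Shift `start` to the Gaussian-weighted average of `points`
    until the shift is smaller than `eps` (a mode of the density)."""
    x = float(start)
    for _ in range(max_iter):
        # Gaussian kernel weight of every data point as seen from x
        weights = [math.exp(-0.5 * ((p - x) / bandwidth) ** 2) for p in points]
        new_x = sum(w * p for w, p in zip(weights, points)) / sum(weights)
        if abs(new_x - x) < eps:
            return new_x
        x = new_x
    return x

# Hypothetical sample with two clusters, around 1.0 and around 5.0
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
print(mean_shift_mode(data, start=0.0))  # climbs to the mode near 1.0
print(mean_shift_mode(data, start=6.0))  # climbs to the mode near 5.0
```

Each starting point climbs to the nearest mode, which is why the
same iteration serves both clustering and tracking.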

Bradski [20] modified the mean shift algorithm analyzed by
Cheng [19] and developed the Continuously Adaptive Mean
Shift (CAMSHIFT) algorithm for face tracking. As a first
step towards a perceptual user interface, a computer vision
color tracking algorithm was developed and applied to
tracking human faces. Computer vision algorithms that are
intended to form part of a perceptual user interface must be
fast and efficient. They must be able to track in real time yet
not absorb a major share of computational resources: other
tasks must be able to run while the visual interface is being
used. The new algorithm was based on a robust
non-parametric technique for climbing density gradients to
find the mode (peak) of probability distributions, called the
mean shift algorithm. In this case, the goal was to find the
mode of a color distribution within a video scene.
Therefore, the mean shift algorithm was modified to deal
with dynamically changing color probability distributions
derived from video frame sequences. The modified algorithm
was called the Continuously Adaptive Mean Shift
(CAMSHIFT) algorithm. CAMSHIFT's tracking accuracy
was compared against a Polhemus tracker. Tolerance to
noise and distractors, as well as performance, was studied.
CAMSHIFT was then used as a computer interface for
controlling commercial computer games and for exploring
immersive 3D graphic worlds.

Comaniciu and Meer successfully applied the mean shift
algorithm to image segmentation [21] and object tracking.
They proposed a new method for real-time tracking of
non-rigid objects seen from a moving camera. The central
computational module is based on the mean shift iterations
and finds the most probable target position in the current
frame. The dissimilarity between the target model (its color
distribution) and the target candidates is expressed by a
metric derived from the Bhattacharyya coefficient. The
theoretical analysis of the approach showed that it relates to
the Bayesian framework while providing a practical, fast and
efficient solution. The capability of the tracker to handle, in
real time, partial occlusions, significant clutter, and target
scale variations was demonstrated for several image
sequences. Comaniciu and Meer [22] then modified their
approach and developed a general non-parametric technique
for the analysis of a complex multimodal feature space and
for delineating arbitrarily shaped clusters. The basic
computational module of the technique was an old pattern
recognition procedure: the mean shift. For discrete data, they
proved the convergence of a recursive mean shift procedure
to the nearest stationary point of the underlying density
function and, thus, its utility in detecting the modes of the
density. The relation of the mean shift procedure to the
Nadaraya-Watson estimator from kernel regression and the
robust M-estimators of location was also established.
Algorithms for two low-level vision tasks,
discontinuity-preserving smoothing and image segmentation,
were described as applications. In those algorithms, the only
user-set parameter was the resolution of the analysis, and
either gray-level or color images were accepted as input.
Extensive experimental results illustrated their excellent
performance.
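
The Bhattacharyya coefficient used in kernel-based tracking
compares two normalised histograms $p$ and $q$ as
$\rho = \sum_u \sqrt{p_u q_u}$, and a commonly used distance
derived from it is $\sqrt{1 - \rho}$. The following Python sketch
shows the computation; the function names and the example
histograms are hypothetical and are not taken from [21] or [22].

```python
import math

def bhattacharyya_coefficient(p, q):
    """Similarity of two normalised histograms (1 means identical)."""
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))

def bhattacharyya_distance(p, q):
    """Distance derived from the coefficient; 0 means identical."""
    return math.sqrt(max(0.0, 1.0 - bhattacharyya_coefficient(p, q)))

target = [0.5, 0.25, 0.25]      # hypothetical target color histogram
candidate = [0.25, 0.25, 0.5]   # hypothetical candidate histogram
print(bhattacharyya_coefficient(target, candidate))
print(bhattacharyya_distance(target, candidate))
```

A tracker of this kind looks for the candidate position whose
histogram maximises the coefficient, or equivalently minimises
the distance.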

Comaniciu et al. [23] addressed vision-based tracking, a
challenging engineering problem and one of the hot research
areas in machine vision. At that time, kernel-based tracking
using the Bhattacharyya similarity measure had been shown to
be an efficient technique for non-rigid object tracking through
a sequence of images. In their paper they presented a robust
and efficient tracking approach for targets whose motions are
large compared to their sizes. Their tracking approach
was based on calculating the Gaussian pyramids of the
images and then applying the mean shift algorithm at each
pyramid level to track the target. Model-based tracking
often suffers from abrupt changes in the target model, which
are compensated for by updates of the target model. This
leads to a very efficient and robust nonparametric tracking
algorithm. The new method was easily able to track
fast-moving targets and is more robust and environment
independent than the original kernel-based object tracking.

Collins [24] observed that the mean-shift algorithm is an
efficient technique for tracking 2D blobs through an image.
Although the scale of the mean-shift kernel is a crucial
parameter, there was at that time no clean mechanism for
choosing or updating scale while tracking blobs that are
changing in size. He adapted Lindeberg's (1998) theory of
feature scale selection, based on local maxima of differential
scale-space filters, to the problem of selecting kernel scale for
mean-shift blob tracking. He showed that a difference of
Gaussian (DOG) mean-shift kernel enables efficient tracking
of blobs through scale space. Using this kernel requires
generalizing the mean-shift algorithm to handle images that
contain negative sample weights.

Zivkovic and Krose [25] noted that the iterative procedure
called 'mean-shift' is a simple, robust method for finding the
position of a local mode (local maximum) of a kernel-based
estimate of a density function. They developed a new robust
algorithm that is a natural extension of the 'mean-shift'
procedure. The new algorithm simultaneously estimates the
position of the local mode and the covariance matrix that
describes the approximate shape of the local mode. They
applied the new method to develop a new 5-degrees-of-freedom
(DOF) color-histogram-based non-rigid object tracking
algorithm.

The Kalman filter technique is used to estimate the state of a
linear system where the state is assumed to follow a Gaussian
distribution [18]. In 1960, R.E. Kalman [14] published his
famous paper describing a recursive solution to the
discrete-data linear filtering problem [1]. Object tracking is
performed by predicting the object's position from the
previous information and verifying the existence of the object
at the predicted position.

In addition, the observed likelihood function and the motion
model must be learnt from sample image sequences before
tracking is performed [15]. The Kalman filter is a set of
mathematical equations that provides an efficient
computational (recursive) means to estimate the state of a
process. It is powerful in several aspects: it supports
estimation of past, present, and even future states, and it can
do so even when the precise nature of the modelled system is
unknown [16-17]. The Kalman filter estimates a process by
using a form of feedback control: the filter estimates the
process state at some time and then obtains feedback in the
form of noisy measurements. The equations for the Kalman
filter fall into two groups: time update equations and
measurement update equations. The time update equations
are responsible for projecting forward (in time) the current
state and error covariance estimates to obtain the a priori
estimate for the next time step. The measurement update
equations are responsible for the feedback, that is, for
incorporating a new measurement into the a priori estimate to
obtain an improved a posteriori estimate. The time update
equations can also be thought of as predictor equations, while
the measurement update equations can be thought of as
corrector equations.
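
For reference, the two groups of equations are usually written
as below for the discrete Kalman filter without a control input.
The symbols are assumptions, since the text does not define
them: $A$ is the state transition matrix, $H$ the measurement
matrix, $Q$ and $R$ the process and measurement noise
covariances, $z_k$ the measurement, $K_k$ the Kalman gain, and
a superscript minus marks an a priori quantity.

```latex
\begin{align}
% Time update (predictor)
\hat{x}_k^- &= A\,\hat{x}_{k-1} \\
P_k^-       &= A\,P_{k-1}\,A^T + Q \\
% Measurement update (corrector)
K_k       &= P_k^- H^T \left(H P_k^- H^T + R\right)^{-1} \\
\hat{x}_k &= \hat{x}_k^- + K_k \left(z_k - H \hat{x}_k^-\right) \\
P_k       &= \left(I - K_k H\right) P_k^-
\end{align}
```

The gain $K_k$ is the weight that balances the prediction
against the noisy measurement: it grows when $R$ is small and
shrinks when the a priori covariance $P_k^-$ is small.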


III. METHODOLOGY

A. Modified Mean Shift Tracking (MMST) Algorithm

1. Initialization: calculate the target model $\hat{q}$ and
initialize the position $y_0$ of the target candidate model in
the previous frame.

The probability of the feature $u$ ($u = 1, 2, \ldots, m$) in the
target model is computed as in [9], where $\hat{q}$ is the
target model, $\hat{q}_u$ is the probability of the $u$-th
element of $\hat{q}$, $\delta$ is the Kronecker delta function,
$b(x_i^*)$ associates the pixel $x_i^*$ with its histogram bin,
and $k(x)$ is an isotropic kernel profile.

2. Initialize the iteration number $k \leftarrow 0$.

3. Calculate the target candidate model $\hat{p}(y_0)$ in the
current frame.

The probability of the feature $u$ in the target candidate model
from the candidate region centered at position $y$ is given by
$\hat{p}(y) = \{\hat{p}_u(y)\}_{u = 1, \ldots, m}$, where $C$ is
a constant in the expression for $\hat{p}_u(y)$.

4. Calculate the weight vector $\{w_i\}_{i = 1, \ldots, n}$.

5. Calculate the new position $y_1$ of the target candidate
model. In the mean shift iteration, the estimated target moves
from $y$ to a new position $y_1$.

6. Let $d \leftarrow \|y_1 - y_0\|$, $y_0 \leftarrow y_1$. Set
the error threshold $\varepsilon$ (default 0.1) and the maximum
iteration number $N$ (default 15). If $d < \varepsilon$ or
$k > N$, stop and go to step 7; otherwise set
$k \leftarrow k + 1$ and go to step 3.

7. Estimate the width, height and orientation from the target
candidate model (Cov).

The Bhattacharyya coefficient can be used to adjust $M_{00}$ in
estimating the target area, denoted by $A$:

$$A = c(\rho)\, M_{00}$$

where $c(\rho)$ is a monotonically increasing function with
respect to the Bhattacharyya coefficient $\rho$
($0 \le \rho \le 1$), $M_{00}$ is estimated from the frame, and
$\lambda_1$ and $\lambda_2$ relate to the width and height of
the target, so that

$$k = \sqrt{\frac{A}{\pi \lambda_1 \lambda_2}}, \qquad a = \sqrt{\frac{\lambda_1 A}{\pi \lambda_2}}, \qquad b = \sqrt{\frac{\lambda_2 A}{\pi \lambda_1}} \qquad (8)$$

Now the covariance matrix becomes

$$\mathrm{Cov} = U \times \begin{bmatrix} a^2 & 0 \\ 0 & b^2 \end{bmatrix} \times U^T \qquad (9)$$

8. Determine the candidate region in the next frame.

Once the location, scale and orientation of the target are
estimated in the current frame, we need to determine the
location of the target candidate region in the next frame. With
Eq. (9), we define the following covariance matrix to
represent the size of the target candidate region in the next
frame

$$\mathrm{Cov}_2 = U \times \begin{bmatrix} (a + \Delta d)^2 & 0 \\ 0 & (b + \Delta d)^2 \end{bmatrix} \times U^T \qquad (10)$$

where $\Delta d$ is the increment of the target candidate region
in the next frame. The position of the initial target candidate
region is defined by the following ellipse region

$$(x - y_1) \times \mathrm{Cov}_2^{-1} \times (x - y_1)^T \le 1 \qquad (11)$$
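
Steps 1 to 6 can be sketched in Python as follows. This is a
minimal illustration under stated assumptions, not the
implementation used in this paper. The candidate histogram is
assumed to take the usual kernel-based form
$\hat{p}_u(y) = C \sum_i k(\|(y - x_i)/h\|^2)\,\delta[b(x_i) - u]$,
the weights the form $w_i = \sqrt{\hat{q}_u / \hat{p}_u(y_0)}$
for the bin $u = b(x_i)$, and the new position the weighted mean
$y_1 = \sum_i x_i w_i / \sum_i w_i$, which holds for the
Epanechnikov profile $k(r) = 1 - r$ chosen here. The helper
names, the window half-size `half` and the synthetic two-bin
frames are hypothetical. The scale and orientation estimation of
steps 7 and 8 is omitted.

```python
import math

def window_pixels(image, center, half):
    """Yield (x, y, r) for pixels in the circular window around `center`.
    r is the squared distance to the center divided by half**2."""
    cx, cy = center
    ix, iy = int(round(cx)), int(round(cy))
    for y in range(max(0, iy - half), min(len(image), iy + half + 1)):
        for x in range(max(0, ix - half), min(len(image[0]), ix + half + 1)):
            r = ((x - cx) ** 2 + (y - cy) ** 2) / float(half * half)
            if r <= 1.0:
                yield x, y, r

def kernel_histogram(image, center, half, bins):
    """Steps 1 and 3: normalised histogram weighted by k(r) = 1 - r."""
    hist = [0.0] * bins
    for x, y, r in window_pixels(image, center, half):
        hist[image[y][x]] += 1.0 - r
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist

def mean_shift_track(image, q, y0, half, bins, eps=0.1, max_iter=15):
    """Steps 2 to 6: iterate until the shift d < eps or k > max_iter."""
    k = 0                                                     # step 2
    while True:
        p = kernel_histogram(image, y0, half, bins)           # step 3
        sx = sy = sw = 0.0
        for x, y, _ in window_pixels(image, y0, half):
            u = image[y][x]
            w = math.sqrt(q[u] / p[u]) if p[u] > 0 else 0.0   # step 4
            sx, sy, sw = sx + w * x, sy + w * y, sw + w
        if sw == 0.0:                  # no pixel supports the target model
            return y0, k
        y1 = (sx / sw, sy / sw)                               # step 5
        d = math.hypot(y1[0] - y0[0], y1[1] - y0[1])          # step 6
        y0 = y1
        if d < eps or k > max_iter:
            return y0, k
        k += 1

def make_frame(cx, cy, size=20):
    """Hypothetical frame: a 3x3 object (bin 1) on a background (bin 0)."""
    return [[1 if abs(x - cx) <= 1 and abs(y - cy) <= 1 else 0
             for x in range(size)] for y in range(size)]

previous, current = make_frame(10, 10), make_frame(12, 10)
q = kernel_histogram(previous, (10, 10), half=2, bins=2)      # step 1
position, iterations = mean_shift_track(current, q, (10, 10), half=2, bins=2)
print(position, iterations)
```

In this synthetic case the object moves two pixels to the right
between the frames, and the iteration started at the old
position settles on the new object center. Background pixels get
zero weight because their bin is absent from the target model.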


B. Algorithm of Kalman Filter For Object Tracking 

The Kalman filter is a set of mathematical equations that 
provides an efficient computational (recursive) means to 
estimate the state of a process, in a way that minimizes the 
mean of the squared error. The filter is very powerful in 
several aspects: it supports estimations of past, present, and 
even future states, and it can do so even when the precise 
nature of the modeled system is unknown [14]. 

The Kalman filter is the best filter among all linear filters,
and the best among all filters when the noise processes are
Gaussian [15].

The Kalman filter is essentially a set of mathematical
equations that implement a predictor-corrector type estimator
that is optimal in the sense that it minimizes the estimated
error covariance when some presumed conditions are met
[17].

The moving object is tracked using the Kalman filter. Any
object can be tracked by providing the frame number from
which tracking should start. In the selected frame, the object
is picked by setting the position of the mask, and it is then
tracked in subsequent frames.

The following steps have been implemented for tracking a
single object.

1. The background frame is calculated by averaging the pixel
values across frames.

2. The frame number from which tracking of an object is to
start is selected.

3. In the selected frame, the object to be tracked is selected
by repositioning the mask.

4. The centroid position of the selected object is found, and
from the centroid information all the time update and
measurement update equations are calculated. For the selected
frame, the actual position X and error P are calculated.
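
The background averaging of step 1 and the centroid computation
of step 4 can be sketched as below. This is a hedged
illustration rather than the implementation used here: frames
are assumed to be grey-level images stored as nested lists, and
the function names, the threshold of 50 and the synthetic frames
are hypothetical.

```python
def average_background(frames):
    """Step 1: pixel-wise mean of all frames."""
    n, height, width = len(frames), len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / n for x in range(width)]
            for y in range(height)]

def moving_mask(frame, background, threshold=50):
    """Mark pixels that differ from the background by more than `threshold`."""
    return [[abs(v - b) > threshold for v, b in zip(row, bg_row)]
            for row, bg_row in zip(frame, background)]

def centroid(mask):
    """Step 4: centroid (x, y) of the marked pixels, or None if none is set."""
    points = [(x, y) for y, row in enumerate(mask)
              for x, on in enumerate(row) if on]
    if not points:
        return None
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

# Hypothetical 6x6 frames: one bright pixel moving along the diagonal.
frames = [[[200 if (x, y) == (i, i) else 10 for x in range(6)]
           for y in range(6)] for i in range(5)]
background = average_background(frames)
print(centroid(moving_mask(frames[2], background)))
```

Because the moving object itself contributes to the average, the
threshold has to be high enough to suppress the faint trace it
leaves in the background frame.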

For all remaining frames, the following steps are repeated.

1. Background subtraction is performed to find all the
moving regions in the frame.

2. Among the regions found, the one with the lowest distance
from the region selected in the previous frame is selected.

3. The centroid and other parameters of the selected region
are used to calculate the time update and measurement update
equations.

4. The obtained state position values X are stored in an array
for every frame.

5. A line joining the stored points is drawn in every frame,
showing the trajectory of the selected moving object.
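
The per-frame loop above can be sketched as follows, assuming
that the centroids of the moving regions have already been
extracted by background subtraction. The state model is an
assumption, since the text does not give it: each coordinate is
filtered independently with a constant-velocity model (state:
position and velocity, one frame per time step). The class and
function names and the noise values `q` and `r` are hypothetical,
and the drawing of the trajectory in step 5 is omitted.

```python
class Kalman1D:
    """Constant-velocity Kalman filter for one coordinate."""

    def __init__(self, pos, q=1e-2, r=1.0):
        self.x = [pos, 0.0]                  # state: position, velocity
        self.P = [[1.0, 0.0], [0.0, 1.0]]    # error covariance
        self.q, self.r = q, r                # process / measurement noise

    def predict(self):
        """Time update with F = [[1, 1], [0, 1]]: x <- F x, P <- F P F^T + Q."""
        pos, vel = self.x
        self.x = [pos + vel, vel]
        (a, b), (c, d) = self.P
        self.P = [[a + b + c + d + self.q, b + d],
                  [c + d, d + self.q]]
        return self.x[0]

    def correct(self, z):
        """Measurement update with H = [1, 0] (only position is measured)."""
        (a, b), (c, d) = self.P
        s = a + self.r                       # innovation covariance
        k0, k1 = a / s, c / s                # Kalman gain
        residual = z - self.x[0]
        self.x = [self.x[0] + k0 * residual, self.x[1] + k1 * residual]
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]  # (I - K H) P
        return self.x[0]

def nearest_region(centroids, previous):
    """Step 2: the centroid closest to the previously selected one."""
    return min(centroids, key=lambda c: (c[0] - previous[0]) ** 2
                                        + (c[1] - previous[1]) ** 2)

def track(detections_per_frame, start):
    """Steps 2 to 4: returns the list of estimated positions."""
    kx, ky = Kalman1D(start[0]), Kalman1D(start[1])
    previous, trajectory = start, []
    for centroids in detections_per_frame:
        kx.predict()
        ky.predict()
        if centroids:                        # skip correction if nothing moved
            previous = nearest_region(centroids, previous)
            kx.correct(previous[0])
            ky.correct(previous[1])
        trajectory.append((kx.x[0], ky.x[0]))
    return trajectory

# Hypothetical detections: the target moves 2 px per frame along x,
# while a second region stays at (50, 50).
detections = [[(2.0 * i, 0.0), (50.0, 50.0)] for i in range(1, 21)]
trajectory = track(detections, start=(0.0, 0.0))
print(trajectory[-1])
```

When no region is detected in a frame, the sketch keeps the
predicted position, which is one simple way of bridging short
occlusions.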




IV. RESULT ANALYSIS

Figure 1: (a, b, c) Tracking results of a vehicle using the MMST
algorithm. Frames 20, 40 and 80 are displayed.

Figure 2: (a, b, c) Tracking results of a vehicle using the
Kalman filter algorithm. Frames 20, 40 and 80 are displayed.

The figures above show the output of applying the modified
mean shift algorithm and the Kalman filter algorithm to
different objects for tracking. The time required for tracking
is given in Table 1.

Table 1: Comparison of MMST and Kalman Filter Object
Tracking Methods

S. No.   Video           No. of Frames   Time (in sec)
                                         MMST    Kalman Filter
1        Car1            300             324     244
2        One stop move   720             100     73

V. CONCLUSION 

From Figures 1 and 2 it is clear that both algorithms can
track the object. Table 1 shows the time required by both
algorithms for tracking, and the Kalman filter is faster on
both videos: 244 s versus 324 s on the first and 73 s versus
100 s on the second, roughly a quarter less time.